Tsinghua University
AI Models Are Starting to Learn by Asking Themselves Questions
An AI model that learns without human input--by posing interesting queries for itself--might point the way to superintelligence. Even the smartest artificial intelligence models are essentially copycats. They learn either by consuming examples of human work or by trying to solve problems that have been set for them by human instructors. But perhaps AI can, in fact, learn in a more human way--by figuring out interesting questions to ask itself and attempting to find the right answer. A project from Tsinghua University, the Beijing Institute for General Artificial Intelligence (BIGAI), and Pennsylvania State University shows that AI can learn to reason in this way by playing with computer code.
- North America > United States > Pennsylvania (0.25)
- Asia > China > Beijing > Beijing (0.25)
- North America > United States > North Carolina (0.05)
- (5 more...)
- Information Technology (1.00)
- Education (0.90)
Controllable risk scenario generation from human crash data for autonomous vehicle testing
Lu, Qiujing, Wang, Xuanhan, Yuan, Runze, Lu, Wei, Gong, Xinyi, Feng, Shuo
Ensuring the safety of autonomous vehicles (AV) requires rigorous testing under both everyday driving and rare, safety-critical conditions. A key challenge lies in simulating environment agents, including background vehicles (BVs) and vulnerable road users (VRUs), that behave realistically in nominal traffic while also exhibiting risk-prone behaviors consistent with real-world accidents. We introduce Controllable Risk Agent Generation (CRAG), a framework designed to unify the modeling of dominant nominal behaviors and rare safety-critical behaviors. CRAG constructs a structured latent space that disentangles normal and risk-related behaviors, enabling efficient use of limited crash data. By combining risk-aware latent representations with optimization-based mode-transition mechanisms, the framework allows agents to shift smoothly and plausibly from safe to risk states over extended horizons, while maintaining high fidelity in both regimes. Extensive experiments show that CRAG improves diversity compared to existing baselines, while also enabling controllable generation of risk scenarios for targeted and efficient evaluation of AV robustness.
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- Asia > China > Beijing > Beijing (0.05)
- Asia > China > Zhejiang Province > Hangzhou (0.04)
- (6 more...)
- Transportation > Ground > Road (0.93)
- Information Technology (0.68)
- Automobiles & Trucks (0.68)
- Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
TaskSense: Cognitive Chain Modeling and Difficulty Estimation for GUI Tasks
Yin, Yiwen, Hu, Zhian, Xu, Xiaoxi, Yu, Chun, Wu, Xintong, Fan, Wenyu, Shi, Yuanchun
Measuring GUI task difficulty is crucial for user behavior analysis and agent capability evaluation. Yet, existing benchmarks typically quantify difficulty based on motor actions (e.g., step counts), overlooking the cognitive demands underlying task completion. In this work, we propose Cognitive Chain, a novel framework that models task difficulty from a cognitive perspective. A cognitive chain decomposes the cognitive processes preceding a motor action into a sequence of cognitive steps (e.g., finding, deciding, computing), each with a difficulty index grounded in information theories. We develop an LLM-based method to automatically extract cognitive chains from task execution traces. Validation with linear regression shows that our estimated cognitive difficulty correlates well with user completion time (step-level R-square=0.46 after annotation). Assessment of state-of-the-art GUI agents shows reduced success on cognitively demanding tasks, revealing capability gaps and Human-AI consistency patterns. We conclude by discussing potential applications in agent training, capability assessment, and human-agent delegation optimization.
- North America > United States > Washington > King County > Seattle (0.14)
- Oceania > Australia > New South Wales > Sydney (0.04)
- Asia > China > Beijing > Beijing (0.04)
- North America > United States > New York > Tompkins County > Ithaca (0.04)
- Workflow (1.00)
- Research Report (1.00)
Flight Dynamics to Sensing Modalities: Exploiting Drone Ground Effect for Accurate Edge Detection
Zhao, Chenyu, Xu, Jingao, Ruan, Ciyu, Wang, Haoyang, Wang, Shengbo, Li, Jiaqi, Zha, Jirong, Hong, Weijie, Yang, Zheng, Liu, Yunhao, Zhang, Xiao-Ping, Chen, Xinlei
Drone-based rapid and accurate environmental edge detection is highly advantageous for tasks such as disaster relief and autonomous navigation. Current methods, using radars or cameras, raise deployment costs and burden lightweight drones with high computational demands. In this paper, we propose AirTouch, a system that transforms the ground effect from a stability "foe" in traditional flight control views, into a "friend" for accurate and efficient edge detection. Our key insight is that analyzing drone basic attitude sensor readings and flight commands allows us to detect ground effect changes. Such changes typically indicate the drone flying over a boundary of two materials, making this information valuable for edge detection. We approach this insight through theoretical analysis, algorithm design, and implementation, fully leveraging the ground effect as a new sensing modality without compromising drone flight stability, thereby achieving accurate and efficient scene edge detection. We also compare this new sensing modality with vision-based methods to clarify its exclusive advantages in resource efficiency and detection capability. Extensive evaluations demonstrate that our system achieves a high detection accuracy with mean detection distance errors of 0.051m, outperforming the baseline method performance by 86%. With such detection performance, our system requires only 43 mW power consumption, contributing to this new sensing modality for low-cost and highly efficient edge detection.
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
- Asia > China > Guangdong Province > Shenzhen (0.05)
- Asia > China > Beijing > Beijing (0.04)
- (4 more...)
- Leisure & Entertainment > Sports > Motorsports (1.00)
- Education (1.00)
- Aerospace & Defense (0.93)
- Transportation > Air (0.68)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (1.00)
Leveraging Multi-Source Textural UGC for Neighbourhood Housing Quality Assessment: A GPT-Enhanced Framework
Hong, Qiyuan, Zhao, Huimin, Long, Ying
This study leverages GPT-4o to assess neighbourhood housing quality using multi-source textural user-generated content (UGC) from Dianping, Weibo, and the Government Message Board. The analysis involves filtering relevant texts, extracting structured evaluation units, and conducting sentiment scoring. A refined housing quality assessment system with 46 indicators across 11 categories was developed, highlighting an objective-subjective method gap and platform-specific differences in focus. GPT-4o outperformed rule-based and BERT models, achieving 92.5% accuracy in fine-tuned settings. The findings underscore the value of integrating UGC and GPT-driven analysis for scalable, resident-centric urban assessments, offering practical insights for policymakers and urban planners.
- Education (1.00)
- Government (0.90)
- Health & Medicine > Therapeutic Area (0.51)
Energy-Efficient Federated Learning for Edge Real-Time Vision via Joint Data, Computation, and Communication Design
Hou, Xiangwang, Wang, Jingjing, Guan, Fangming, Du, Jun, Jiang, Chunxiao, Ren, Yong
Emerging real-time computer vision (CV) applications on wireless edge devices demand energy-efficient and privacy-preserving learning. Federated learning (FL) enables on-device training without raw data sharing, yet remains challenging in resource-constrained environments due to energy-intensive computation and communication, as well as limited and non-i.i.d. data. We propose FedDPQ, an ultra energy-efficient FL framework for real-time CV over unreliable wireless networks. FedDPQ integrates diffusion-based data augmentation, model pruning, communication quantization, and transmission power control to enhance training efficiency. It expands local datasets using synthetic data, reduces computation through pruning, compresses updates via quantization, and mitigates transmission outages with adaptive power control. We further derive a closed-form energy-convergence model capturing the coupled impact of these components, and develop a Bayesian optimization (BO)-based algorithm to jointly tune the data augmentation strategy, pruning ratio, quantization level, and power control. The work of Xiangwang Hou was supported by the National Natural Science Foundation of China under Grant No. 623B2060. The work of Jingjing Wang was partly supported by the National Natural Science Foundation of China under Grants No. 62222101 and No. U24A20213, partly supported by the Beijing Natural Science Foundation under Grants No. L232043 and No. L222039, partly supported by the Natural Science Foundation of Zhejiang Province under Grant No. LMS25F010007, and partly supported by the Fundamental Research Funds for the Central Universities. The work of Jun Du was partly supported by the National Natural Science Foundation of China under Grants No. 62422109 and No. U23A20281.
- Asia > China > Beijing > Beijing (0.25)
- Asia > China > Zhejiang Province > Hangzhou (0.04)
- North America > United States > Maryland > Prince George's County > College Park (0.04)
- (6 more...)
- Education (1.00)
- Telecommunications (0.93)
- Information Technology (0.67)
- Information Technology > Communications > Networks (1.00)
- Information Technology > Architecture > Real Time Systems (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Game-Theoretic Modeling of Vehicle Unprotected Left Turns Considering Drivers' Bounded Rationality
Lian, Yuansheng, Zhang, Ke, Li, Meng, Li, Shen
Modeling the decision-making behavior of vehicles presents unique challenges, particularly during unprotected left turns at intersections, where the uncertainty of human drivers is especially pronounced. In this context, connected autonomous vehicle (CAV) technology emerges as a promising avenue for effectively managing such interactions while ensuring safety and efficiency. Traditional approaches, often grounded in game theory assumptions of perfect rationality, may inadequately capture the complexities of real-world scenarios and drivers' decision-making errors. To fill this gap, we propose a novel decision-making model for vehicle unprotected left-turn scenarios, integrating game theory with considerations for drivers' bounded rationality. Our model, formulated as a two-player normal-form game solved by a quantal response equilibrium (QRE), offers a more nuanced depiction of driver decision-making processes compared to Nash equilibrium (NE) models. Leveraging an Expectation-Maximization (EM) algorithm coupled with a subtle neural network trained on precise microscopic vehicle trajectory data, we optimize model parameters to accurately reflect drivers' interaction-aware bounded rationality and driving styles. Through comprehensive simulation experiments, we demonstrate the efficacy of our proposed model in capturing the interaction-aware bounded rationality and decision tendencies between players. The proposed model proves to be more realistic and efficient than NE models in unprotected left-turn scenarios. Our findings contribute valuable insights into the vehicle decision-making behaviors with bounded rationality, thereby informing the development of more robust and realistic autonomous driving systems.
Connected autonomous vehicle (CAV) refers to a vehicle that can operate autonomously and communicate with other vehicles and infrastructure to enhance safety and efficiency. This work was supported by grants from the National Key Research and Development Program of China (2022YFB2503200) and the Tsinghua University-Mercedes Benz Joint Institute for Sustainable Mobility. Consequently, there arises an urgent need to develop models that enable the operation of CAVs within mixed traffic environments, enabling them to anticipate the intentions of surrounding human drivers and make human-like decisions based on these expectations and feedback. In the context of mixed traffic environments, one of the most prevalent scenarios entails vehicles executing unprotected left turns at signalized intersections.
- Asia > China > Beijing > Beijing (0.04)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)
- (3 more...)
- Transportation > Ground > Road (1.00)
- Information Technology (1.00)
- Automobiles & Trucks (1.00)
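The left-turn abstract above describes a two-player normal-form game solved by a quantal response equilibrium (QRE). As a rough illustration of that idea only, the sketch below computes a logit QRE by damped fixed-point iteration. The logit form, the payoff numbers, the `rationality` parameter, and all function names are assumptions made for this example; they are not taken from the paper, which fits its parameters with an EM algorithm and a neural network. The iteration is also not guaranteed to converge for every game.

```python
import math


def logit_response(expected_payoffs, rationality):
    """Logit (softmax) choice probabilities for one player.

    rationality = 0 gives uniform random play; larger values approach
    a best response. This parameterization is an assumption.
    """
    peak = max(expected_payoffs)  # subtract the max for numerical stability
    weights = [math.exp(rationality * (u - peak)) for u in expected_payoffs]
    total = sum(weights)
    return [w / total for w in weights]


def logit_qre(row_payoffs, col_payoffs, rationality, iterations=2000, damping=0.5):
    """Approximate a logit QRE of a two-player game by damped iteration.

    row_payoffs[i][j] and col_payoffs[i][j] are the payoffs to the row and
    column player when the row player picks i and the column player picks j.
    """
    n_rows, n_cols = len(row_payoffs), len(row_payoffs[0])
    p = [1.0 / n_rows] * n_rows  # row player's mixed strategy
    q = [1.0 / n_cols] * n_cols  # column player's mixed strategy
    for _ in range(iterations):
        # Expected payoff of each action against the opponent's current mix.
        u_row = [sum(row_payoffs[i][j] * q[j] for j in range(n_cols))
                 for i in range(n_rows)]
        u_col = [sum(col_payoffs[i][j] * p[i] for i in range(n_rows))
                 for j in range(n_cols)]
        p_target = logit_response(u_row, rationality)
        q_target = logit_response(u_col, rationality)
        # Move part of the way toward the noisy best responses.
        p = [(1 - damping) * old + damping * new for old, new in zip(p, p_target)]
        q = [(1 - damping) * old + damping * new for old, new in zip(q, q_target)]
    return p, q


# Hypothetical payoffs; action 0 = go, action 1 = yield.
# Indexing is [left-turning driver's action][through driver's action].
LEFT_TURN_PAYOFFS = [[-10.0, 2.0], [0.0, 0.0]]
THROUGH_PAYOFFS = [[-10.0, 0.0], [3.0, 0.0]]

if __name__ == "__main__":
    for rationality in (0.0, 0.5, 2.0):
        p, q = logit_qre(LEFT_TURN_PAYOFFS, THROUGH_PAYOFFS, rationality)
        print(rationality, [round(x, 3) for x in p], [round(x, 3) for x in q])
```

Unlike a Nash equilibrium, every action keeps a nonzero probability here, which is one common way to represent decision errors by boundedly rational drivers.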
Understanding Knowledge Transferability for Transfer Learning: A Survey
Wang, Haohua, Wang, Jingge, Zhao, Zijie, Tan, Yang, Wu, Yanru, Liu, Hanbing, Yang, Jingyun, Zhang, Enming, Chen, Xiangyu, Rong, Zhengze, Guo, Shanxin, Li, Yang
Transfer learning has become an essential paradigm in artificial intelligence, enabling the transfer of knowledge from a source task to improve performance on a target task. This approach, particularly through techniques such as pretraining and fine-tuning, has seen significant success in fields like computer vision and natural language processing. However, despite its widespread use, how to reliably assess the transferability of knowledge remains a challenge. Understanding the theoretical underpinnings of each transferability metric is critical for ensuring the success of transfer learning. In this survey, we provide a unified taxonomy of transferability metrics, categorizing them based on transferable knowledge types and measurement granularity. This work examines the various metrics developed to evaluate the potential of source knowledge for transfer learning and their applicability across different learning paradigms emphasizing the need for careful selection of these metrics. By offering insights into how different metrics work under varying conditions, this survey aims to guide researchers and practitioners in selecting the most appropriate metric for specific applications, contributing to more efficient, reliable, and trustworthy AI systems. Finally, we discuss some open challenges in this field and propose future research directions to further advance the application of transferability metrics in trustworthy transfer learning.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > China > Guangdong Province > Shenzhen (0.06)
- Asia > Middle East > Jordan (0.04)
- (2 more...)
- Overview (1.00)
- Research Report (0.82)
NOVA3D: Normal Aligned Video Diffusion Model for Single Image to 3D Generation
Yang, Yuxiao, Li, Peihao, Zhang, Yuhong, Lu, Junzhe, He, Xianglong, Qin, Minghan, Wang, Weitao, Wang, Haoqian
3D AI-generated content (AIGC) has made it increasingly accessible for anyone to become a 3D content creator. While recent methods leverage Score Distillation Sampling to distill 3D objects from pretrained image diffusion models, they often suffer from inadequate 3D priors, leading to insufficient multi-view consistency. In this work, we introduce NOVA3D, an innovative single-image-to-3D generation framework. Our key insight lies in leveraging strong 3D priors from a pretrained video diffusion model and integrating geometric information during multi-view video fine-tuning. To facilitate information exchange between color and geometric domains, we propose the Geometry-Temporal Alignment (GTA) attention mechanism, thereby improving generalization and multi-view consistency. Moreover, we introduce the de-conflict geometry fusion algorithm, which improves texture fidelity by addressing multi-view inaccuracies and resolving discrepancies in pose alignment. Extensive experiments validate the superiority of NOVA3D over existing baselines.
LF-GNSS: Towards More Robust Satellite Positioning with a Hard Example Mining Enhanced Learning-Filtering Deep Fusion Framework
Global Navigation Satellite System (GNSS) is essential for autonomous driving systems, unmanned vehicles, and various location-based technologies, as it provides the precise geospatial information necessary for navigation and situational awareness. However, its performance is often degraded by Non-Line-Of-Sight (NLOS) and multipath effects, especially in urban environments. Recently, Artificial Intelligence (AI) has been driving innovation across numerous industries, introducing novel solutions to mitigate the challenges in satellite positioning. This paper presents a learning-filtering deep fusion framework for satellite positioning, termed LF-GNSS. The framework utilizes deep learning networks to intelligently analyze the signal characteristics of satellite observations, enabling the adaptive construction of observation noise covariance matrices and compensated innovation vectors for Kalman filter input. A dynamic hard example mining technique is incorporated to enhance model robustness by prioritizing challenging satellite signals during training. Additionally, we introduce a novel feature representation based on Dilution of Precision (DOP) contributions, which helps to more effectively characterize the signal quality of individual satellites and improve measurement weighting. LF-GNSS has been validated on both public and private datasets, demonstrating superior positioning accuracy compared to traditional methods and other learning-based solutions.
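The LF-GNSS abstract describes feeding a Kalman filter with adaptively constructed observation noise covariances and compensated innovation vectors. The scalar sketch below shows only where such quantities would enter a standard Kalman measurement update. It is not the paper's method: the one-dimensional state, the function names, and the sign convention for `compensation` are assumptions for illustration, and the per-measurement noise variance and compensation are plain numbers passed in by the caller rather than network outputs.

```python
def kalman_update(state, variance, measurement, noise_variance, compensation=0.0):
    """One scalar Kalman measurement update.

    noise_variance stands in for a per-measurement noise term and
    compensation for a correction added to the innovation; in a learned
    system both could be predicted per satellite (assumption).
    """
    innovation = measurement - state + compensation
    gain = variance / (variance + noise_variance)
    new_state = state + gain * innovation
    new_variance = (1.0 - gain) * variance
    return new_state, new_variance


def fuse_measurements(state, variance, observations):
    """Apply updates sequentially.

    observations is an iterable of (measurement, noise_variance, compensation).
    """
    for measurement, noise_variance, compensation in observations:
        state, variance = kalman_update(
            state, variance, measurement, noise_variance, compensation
        )
    return state, variance


if __name__ == "__main__":
    # Hypothetical values: a trusted measurement and a heavily down-weighted one.
    observations = [(10.0, 1.0, 0.0), (25.0, 1e6, 0.0)]
    print(fuse_measurements(0.0, 1.0, observations))
```

A measurement assigned a very large noise variance receives a gain near zero, so a degraded signal such as an NLOS or multipath observation would barely move the estimate.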